The Information-Theoretic Requirements of Subspace Clustering with Missing Data
Abstract
Subspace clustering with missing data (SCMD) is a useful tool for analyzing incomplete datasets. Let d be the ambient dimension and r the dimension of the subspaces. Existing theory shows that N_k = O(rd) columns per subspace are necessary for SCMD, and that N_k = O(min{d^{log d}, d^{r+1}}) are sufficient. We close this gap, showing that N_k = O(rd) is also sufficient. To do this, we derive deterministic sampling conditions for SCMD, which give precise information-theoretic requirements and determine sampling regimes. These results explain the performance of SCMD algorithms from the literature. Finally, we give a practical algorithm to certify the output of any SCMD method deterministically.
Similar resources
Subspace Clustering with Missing Data
Subspace clustering with missing data can be seen as the combination of subspace clustering and low-rank matrix completion, which is essentially equivalent to high-rank matrix completion under the assumption that the columns of the matrix X ∈ R^{d×N} belong to a union of subspaces. It is a challenging problem, both in terms of computation and inference. In this report, we study two efficient algorith...
NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self-Organizing Map
Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic and needs to be carefully considered before learning from or predicting time series. Much research has been done on the use of diffe...
From subspace clustering to full-rank matrix completion
Subspace clustering is the problem of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This type of structure occurs naturally in many applications ranging from bioinformatics, image/text clustering to semi-supervised learning. The companion paper [3] shows that robust and tractable subspace clustering is possible with minimal re...
Using Supervised Clustering Technique to Classify Received Messages in 137 Call Center of Tehran City Council
Supervised clustering is a data mining technique that assigns a set of data to predefined classes by analyzing dataset attributes. It is considered an important technique for information retrieval, management, and mining in information systems. Since customer satisfaction is the main goal of organizations in modern society, to meet the requirements, the 137 call center of the Tehran City Council is ...
Publication date: 2016